AITopics | comprehension dataset

An Information-Theoretic Approach to Analyze NLP Classification Tasks

Wang, Luran, Gales, Mark, Raina, Vatsal

arXiv.org Artificial Intelligence, Feb-1-2024

Understanding the importance of the inputs on the output is useful across many tasks. This work provides an information-theoretic framework to analyse the influence of inputs for text classification tasks. Natural language processing (NLP) tasks take either a single element input or multiple element inputs to predict an output variable, where an element is a block of text. Each text element has two components: an associated semantic meaning and a linguistic realization. Multiple-choice reading comprehension (MCRC) and sentiment classification (SC) are selected to showcase the framework. For MCRC, it is found that the context influence on the output compared to the question influence reduces on more challenging datasets. In particular, more challenging contexts allow a greater variation in complexity of questions. Hence, test creators need to carefully consider the choice of the context when designing multiple-choice questions for assessment. For SC, it is found that the semantic meaning of the input text dominates (above 80% for all datasets considered) compared to its linguistic realization when determining the sentiment. The framework is made available at: https://github.com/WangLuran/nlp-element-influence

classification, comprehension, dataset, (14 more...)

arXiv: 2402.00978

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Michigan (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Education > Assessment & Standards > Student Performance (0.53)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Assessing Distractors in Multiple-Choice Tests

Raina, Vatsal, Liusie, Adian, Gales, Mark

arXiv.org Artificial Intelligence, Nov-8-2023

Multiple-choice tests are a common approach for assessing candidates' comprehension skills. Standard multiple-choice reading comprehension exams require candidates to select the correct answer option from a discrete set based on a question in relation to a contextual passage. For appropriate assessment, the distractor answer options must by definition be incorrect but plausible and diverse. However, generating good quality distractors satisfying these criteria is a challenging task for content creators. We propose automated assessment metrics for the quality of distractors in multiple-choice reading comprehension tests. Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options. We assess incorrectness using the classification ability of a binary multiple-choice reading comprehension system. Plausibility is assessed by considering the distractor confidence - the probability mass associated with the distractor options for a standard multi-class multiple-choice reading comprehension system. Diversity is assessed by pairwise comparison of an embedding-based equivalence metric between the distractors of a question. To further validate the plausibility metric we compare against candidate distributions over multiple-choice questions and agreement with a ChatGPT model's interpretation of distractor plausibility and diversity.
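The plausibility and diversity measures described in this abstract can be illustrated with a minimal sketch. It assumes the option probabilities and distractor embeddings are already available from some model, and it uses cosine similarity as a stand-in for the embedding-based equivalence metric, which the abstract does not specify. The function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def distractor_confidence(option_probs, correct_index):
    """Probability mass a multi-class MCRC system puts on the distractors."""
    return sum(p for i, p in enumerate(option_probs) if i != correct_index)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def distractor_diversity(embeddings):
    """Mean pairwise dissimilarity (1 - cosine) across distractor embeddings.

    Assumption: cosine similarity substitutes for the paper's equivalence metric.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical model outputs for one four-option question (option 0 is correct).
probs = [0.7, 0.2, 0.05, 0.05]
print(distractor_confidence(probs, correct_index=0))
print(distractor_diversity([[1.0, 0.0], [0.0, 1.0]]))
```

Under this reading, a higher distractor confidence suggests more plausible distractors, and a higher mean dissimilarity suggests a more diverse set.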

dataset, distractor, diversity, (17 more...)

arXiv: 2311.04554

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre:

Questionnaire & Opinion Survey (0.76)
Research Report (0.50)

Industry: Education > Assessment & Standards > Student Performance (0.99)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change

Pirozelli, Paulo, José, Marcos M., Silveira, Igor, Nakasato, Flávio, Peres, Sarajane M., Brandão, Anarosa A. F., Costa, Anna H. R., Cozman, Fabio G.

arXiv.org Artificial Intelligence, Sep-19-2023

Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge. Despite its potential, a detailed set of baselines has not yet been developed for Pirá. By creating these baselines, researchers can more easily utilize Pirá as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, where we fixed a number of grammar issues, repetitions, and other shortcomings. Furthermore, the dataset has been extended in several new directions, so as to face the aforementioned benchmarks: translation of supporting texts from English into Portuguese, classification labels for answerability, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges provided by the Pirá dataset.

brazilian coast, climate change, comprehension dataset, (2 more...)

arXiv: 2309.10945

Genre: Research Report (0.40)

Industry: Education > Assessment & Standards > Student Performance (0.80)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

YORC: Yoruba Reading Comprehension dataset

Aremu, Anuoluwapo, Alabi, Jesujoba O., Adelani, David Ifeoluwa

arXiv.org Artificial Intelligence, Sep-14-2023

In this paper, we create YORC: a new multi-choice Yoruba Reading Comprehension dataset that is based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer using the existing English RACE dataset based on a pre-trained encoder-only model. Additionally, we provide results by prompting large language models (LLMs) like GPT-4.

adelani, computational linguistic, dataset, (16 more...)

arXiv: 2308.09768

Country:

Asia > Middle East > Israel (0.05)
Africa > Nigeria (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre: Research Report (0.40)

Industry:

Education > Assessment & Standards > Student Performance (0.84)
Education > Educational Setting > K-12 Education > Secondary School (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Analyzing Multiple-Choice Reading and Listening Comprehension Tests

Raina, Vatsal, Liusie, Adian, Gales, Mark

arXiv.org Artificial Intelligence, Jul-3-2023

Multiple-choice reading and listening comprehension tests are an important part of language assessment. Content creators for standard educational tests need to carefully curate questions that assess the comprehension abilities of candidates taking the tests. However, recent work has shown that a large number of questions in general multiple-choice reading comprehension datasets can be answered without comprehension, by leveraging world knowledge instead. This work investigates how much of a contextual passage needs to be read in multiple-choice reading based on conversation transcriptions and listening comprehension tests to be able to work out the correct answer. We find that automated reading comprehension systems can perform significantly better than random with partial or even no access to the context passage. These findings offer an approach for content creators to automatically capture the trade-off between comprehension and world knowledge required for their proposed questions.
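One way to picture the partial-context experiment is the sketch below. It assumes the context is truncated to a leading fraction of its words and that system accuracy is compared against the random-guess rate. The abstract does not state the truncation scheme, so these helper names and the word-level cut are illustrative assumptions only.

```python
def truncate_context(passage, fraction):
    """Keep only the leading fraction of the passage's words (assumed scheme)."""
    fraction = min(max(fraction, 0.0), 1.0)
    words = passage.split()
    keep = int(len(words) * fraction)
    return " ".join(words[:keep])


def accuracy(predictions, answers):
    """Share of questions where the predicted option index matches the answer."""
    if not answers:
        return 0.0
    correct = sum(1 for p, a in zip(predictions, answers) if p == a)
    return correct / len(answers)


def chance_accuracy(num_options):
    """Expected accuracy of uniform random guessing over the options."""
    return 1.0 / num_options


# Hypothetical usage: feed truncate_context(passage, f) to a system for several
# values of f, then compare accuracy(...) against chance_accuracy(4).
print(truncate_context("the quick brown fox", 0.5))
```

With `fraction=0.0` the system sees no context at all, which corresponds to the world-knowledge-only setting the abstract mentions.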

comprehension, machine learning, natural language, (19 more...)

arXiv: 2307.01076

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Education > Assessment & Standards > Student Performance (0.60)
Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

Bojic, Iva, Halim, Josef, Suharman, Verena, Tar, Sreeja, Ong, Qi Chwen, Phung, Duy, Ravaut, Mathieu, Joty, Shafiq, Car, Josip

arXiv.org Artificial Intelligence, May-26-2023

Low-quality data can cause downstream problems in high-stakes applications. A data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed for training general-purpose Large Language Models (LLMs), as well as for domain-specific models, which are usually small in size as it is costly to engage a large number of domain experts for their creation. Thus, it is vital to ensure high-quality domain-specific training data. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and showed relative improvement of up to 33%/40% for fine-tuning of retrieval/reader models on the BioASQ dataset when using back translation to enhance the original dataset quality.

comprehension dataset, data-centric framework, domain-specific machine

doi: 10.18653/v1/2023.insights-1.3

arXiv: 2304.00483

Genre: Research Report (0.40)

Industry: Education > Assessment & Standards > Student Performance (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

A Multilingual Modeling Method for Span-Extraction Reading Comprehension

Wu, Gaochen, Xu, Bin, Chang, Dejie, Liu, Bangchang

arXiv.org Artificial Intelligence, May-31-2021

Span-extraction reading comprehension models have made tremendous advances enabled by the availability of large-scale, high-quality training datasets. Despite such rapid progress and widespread application, extractive reading comprehension datasets in languages other than English remain scarce, and creating a sufficient amount of training data for each language is costly and even impossible. An alternative to creating large-scale high-quality monolingual span-extraction training datasets is to develop multilingual modeling approaches and systems which can transfer to the target language without requiring training data in that language. In this paper, to address the scarcity of extractive reading comprehension training data in the target language, we propose a multilingual extractive reading comprehension approach called XLRC by simultaneously modeling the existing extractive reading comprehension training data in a multilingual environment using self-adaptive attention and multilingual attention. Specifically, we first construct multilingual parallel corpora by translating the existing extractive reading comprehension datasets (i.e., CMRC 2018) from the target language (i.e., Chinese) into different language families (i.e., English). Secondly, to enhance the final target representation, we adopt self-adaptive attention (SAA) to combine self-attention and inter-attention to extract the semantic relations from each pair of the target and source languages. Furthermore, we propose multilingual attention (MLA) to learn the rich knowledge from various language families. Experimental results show that our model outperforms the state-of-the-art baseline (i.e., RoBERTa_Large) on the CMRC 2018 task, which demonstrates the effectiveness of our proposed multilingual modeling approach and shows its potential for multilingual NLP tasks.

cmrc 2018, dataset, xlrc, (15 more...)

arXiv: 2105.1488

Country:

North America > United States > Texas > El Paso County > El Paso (0.05)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

R3: A Reading Comprehension Benchmark Requiring Reasoning Processes

Wang, Ran, Tao, Kun, Song, Dingjie, Zhang, Zhilong, Ma, Xiao, Su, Xi'ao, Dai, Xinyu

arXiv.org Artificial Intelligence, Apr-2-2020

Existing question answering systems can only predict answers without explicit reasoning processes, which hinders their explainability and makes us overestimate their ability of understanding and reasoning over natural language. In this work, we propose a novel task of reading comprehension, in which a model is required to provide final answers and reasoning processes. To this end, we introduce a formalism for reasoning over unstructured text, namely Text Reasoning Meaning Representation (TRMR). TRMR consists of three phrases, which is expressive enough to characterize the reasoning process to answer reading comprehension questions. We develop an annotation platform to facilitate TRMR's annotation, and release the R3 dataset, a Reading comprehension benchmark Requiring Reasoning processes. R3 contains over 60K question-answer pairs and their TRMRs. Our dataset is available at: http://anonymous.

comprehension, natural language processing, proceedings, (12 more...)

arXiv: 2004.01251

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
(12 more...)

Genre: Research Report (0.50)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.88)

Large-scale Cloze Test Dataset Created by Teachers

Xie, Qizhe, Lai, Guokun, Dai, Zihang, Hovy, Eduard

arXiv.org Artificial Intelligence, Aug-27-2018

Cloze tests are widely adopted in language exams to evaluate students' language proficiency. In this paper, we propose the first large-scale human-created cloze test dataset CLOTH, containing questions used in middle-school and high-school language exams. With missing blanks carefully created by teachers and candidate choices purposely designed to be nuanced, CLOTH requires a deeper language understanding and a wider attention span than previously automatically-generated cloze datasets. We test the performance of dedicatedly designed baseline models including a language model trained on the One Billion Word Corpus and show humans outperform them by a significant margin. We investigate the source of the performance gap, trace model deficiencies to some distinct properties of CLOTH, and identify the limited ability of comprehending the long-term context to be the key bottleneck.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv: 1711.03225

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry:

Education > Curriculum > Subject-Specific Education (0.66)
Education > Educational Setting > K-12 Education (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

QuAC: Question Answering in Context

Choi, Eunsol, He, He, Iyyer, Mohit, Yatskar, Mark, Yih, Wen-tau, Choi, Yejin, Liang, Percy, Zettlemoyer, Luke

arXiv.org Artificial Intelligence, Aug-27-2018

We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The dialogs involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as we show in a detailed qualitative evaluation. We also report results for a number of reference models, including a recently state-of-the-art reading comprehension architecture extended to model dialog context. Our best model underperforms humans by 20 F1, suggesting that there is significant room for future work on this data. Dataset, baseline, and leaderboard available at http://quac.ai.
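The F1 figure quoted above is not defined in the abstract. As a rough illustration, the sketch below computes the token-overlap F1 commonly used for extractive question answering, on the assumption that a similar word-level measure is meant. The function name and the simple lowercase, whitespace tokenization are illustrative choices, not the official QuAC scorer.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Word-overlap F1 between a predicted answer span and a reference span."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both spans.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the cat", "the cat sat"))
```

A gap of 20 F1 would then correspond to a difference of 0.20 on this 0 to 1 scale, averaged over questions.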

machine learning, natural language, question answering, (18 more...)

arXiv: 1808.07036

Country:

Europe > United Kingdom > England (0.04)
Europe > Ireland (0.04)
South America > Chile (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Education > Assessment & Standards > Student Performance (0.35)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)
